19 research outputs found

    Learning To Scale Up Search-Driven Data Integration

    Get PDF
    A recent movement to tackle the long-standing data integration problem is a compositional and iterative approach, termed “pay-as-you-go” data integration. Under this model, the objective is to immediately support queries over “partly integrated” data, and to enable the user community to drive integration of the data that relate to their actual information needs. Over time, data will be gradually integrated. While the pay-as-you-go vision has been well-articulated for some time, only recently have we begun to understand how it can be realized in a system implementation. One branch of this effort has focused on enabling queries through keyword search-driven data integration, in which users pose queries over partly integrated data encoded as a graph, receive ranked answers generated from data and metadata that are linked at query time, and provide feedback on those answers. From this user feedback, the system learns to repair bad schema matches or record links. Many real-world issues of uncertainty and diversity in search-driven integration remain open. Such tasks in search-driven integration require a combination of human guidance and machine learning. The challenge is how to make maximal use of limited human input. This thesis develops three methods to scale up search-driven integration, through learning from expert feedback: (1) active learning techniques to repair links from small amounts of user feedback; (2) collaborative learning techniques to combine users’ conflicting feedback; and (3) debugging techniques to identify where data experts could best improve integration quality. We implement these methods within the Q System, a prototype of search-driven integration, and validate their effectiveness over real-world datasets.

    Room-temperature photoluminescence mediated by sulfur vacancies in 2D molybdenum disulfide

    Get PDF
    Atomic defects in monolayer transition metal dichalcogenides (TMDs) such as chalcogen vacancies significantly affect their properties. In this work, we provide a reproducible and facile strategy to rationally induce chalcogen vacancies in monolayer MoS2 by annealing at 600 °C in an argon/hydrogen (95%/5%) atmosphere. Synchrotron X-ray photoelectron spectroscopy shows that a Mo 3d5/2 core peak at 230.1 eV emerges in the annealed MoS2, associated with nonstoichiometric MoSx (0 < x < 2), and Raman spectroscopy shows an enhancement of the ∼380 cm–1 peak that is attributed to sulfur vacancies. At sulfur vacancy densities of ∼1.8 × 1014 cm–2, we observe a defect peak at ∼1.72 eV (referred to as LXD) at room temperature in the photoluminescence (PL) spectrum. The LXD peak is attributed to excitons trapped at defect-induced in-gap states and is typically observed only at low temperatures (≤77 K). Time-resolved PL measurements reveal that the lifetime of defect-mediated LXD emission is longer than that of band edge excitons, both at room and low temperatures (∼2.44 ns at 8 K). The LXD peak can be suppressed by annealing the defective MoS2 in sulfur vapor, which indicates that it is possible to passivate the vacancies. Our results provide insights into how excitonic and defect-mediated PL emissions in MoS2 are influenced by sulfur vacancies at room and low temperatures.

    Probabilistic String Similarity Joins

    No full text
    Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the intensive efforts devoted to processing (deterministic) string joins and to managing probabilistic data, respectively, modeling and processing probabilistic strings is still a largely unexplored territory. This work studies the string join problem in probabilistic string databases, using the expected edit distance (EED) as the similarity measure. We first discuss two probabilistic string models to capture the fuzziness in string values in real-world applications. The string-level model is complete, but may be expensive to represent and process. The character-level model has a much more succinct representation when uncertainty in strings exists only at certain positions. Since computing the EED between two probabilistic strings is prohibitively expensive, we have designed efficient and effective pruning techniques that can be easily implemented in existing relational database engines for both models. Extensive experiments on real data have demonstrated order-of-magnitude improvements of our approaches over the baseline.
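    The character-level model and the EED measure can be illustrated with a small sketch. The code below is an assumption of mine for illustration only: it represents a character-level probabilistic string as per-position character distributions and estimates EED by Monte-Carlo sampling, whereas the paper itself develops exact pruning techniques rather than a sampling estimator.

    ```python
    import random

    def levenshtein(a, b):
        # Standard dynamic-programming edit distance between two plain strings.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def sample_string(cstring, rng):
        # cstring: character-level model, a list of {char: probability} dicts,
        # one dict per position. Draw one concrete realization.
        return "".join(rng.choices(list(d), weights=list(d.values()))[0]
                       for d in cstring)

    def expected_edit_distance(s1, s2, n=2000, seed=0):
        # Monte-Carlo estimate of EED: average edit distance over n sampled
        # realization pairs (an approximation, not the paper's exact method).
        rng = random.Random(seed)
        return sum(levenshtein(sample_string(s1, rng), sample_string(s2, rng))
                   for _ in range(n)) / n
    ```

    For instance, two strings that are certain at every position reduce to the ordinary edit distance, while positions with split probability mass contribute fractionally to the expectation.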

    Effect of 3-Mercaptopropyltriethoxysilane Modified Illite on the Reinforcement of SBR

    No full text
    To achieve sustainable development of the rubber industry, replacing carbon black, the most widely used but non-renewable filler produced from petroleum, is considered one of the most effective strategies. Naturally occurring illite, with its high aspect ratio, can be obtained in large amounts at low cost and with low energy consumption, so expanding its application in advanced materials is of great significance. To explore its potential as a reinforcing additive for rubber, styrene butadiene rubber (SBR) composites filled with illites of different sizes, with and without 3-mercaptopropyltriethoxysilane (KH580) modification, were studied. It was found that modification of illite by KH580 increases the K-illite/SBR interaction and thus improves the dispersion of K-illite in the SBR matrix. The better dispersion of the smaller K-illite particles, together with their stronger interfacial interaction, improves the mechanical properties of SBR remarkably, increasing the tensile strength about ninefold and the modulus more than tenfold. These results demonstrate that, beyond the evident effect of particle size, the filler–rubber interaction is of great importance to the performance of SBR composites. This may be of great significance for the potential wide use of abundant naturally occurring illite as a substitute filler in the rubber industry.

    Germinal disc region: an appropriate source for obtaining maternal DNA from eggs

    No full text
    Eggs may serve as an alternative source for DNA extraction. The quality of DNA extracted from eggshell, whole egg liquid (WEL) and the germinal disc region (GDR) was compared based on spectrophotometric, electrophoretic, PCR and reduced-representation library sequencing (RRLS) results. Although these DNAs were all invisible on the gel and could not be measured spectrophotometrically, the GDR DNA was superior to the eggshell and WEL DNA in PCR efficiency. After whole genome amplification (WGA) was introduced, the yield of GDR DNA increased significantly, and the resulting DNA was overwhelmingly superior to the eggshell and WEL DNA in the ratio of captured genome and the number of called SNPs. GDR DNA extraction followed by WGA thus provides a method to obtain sufficient DNA from a single egg.

    Active learning in keyword search-based data integration

    No full text
    The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: Global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, where the system lazily discovers associations enabling it to join together matches to keywords, and returns ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few “top” results: This result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
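    The selection problem described above can be sketched in a few lines. The following is a minimal illustration of my own, not the paper's actual model: the answer IDs and matcher scores are hypothetical, and disagreement (standard deviation) among component matchers' scores stands in as a simple proxy for how informative feedback on a result would be.

    ```python
    import statistics

    def pick_for_feedback(results, k=3):
        # results: list of (answer_id, [scores from different matchers]).
        # Rank answers by the population standard deviation of their scores,
        # so the answers the matchers disagree on most are shown for feedback.
        ranked = sorted(results,
                        key=lambda r: statistics.pstdev(r[1]),
                        reverse=True)
        return [answer_id for answer_id, _ in ranked[:k]]
    ```

    A production system would combine such an informativeness signal with predicted relevance when assembling the top results, rather than using disagreement alone.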